AD-A043 494     UNCLASSIFIED

RICE UNIV HOUSTON TEX DEPT OF ELECTRICAL ENGINEERING     F/G 6/4
AN ALGORITHM FOR OPTIMAL NONLINEAR STRUCTURE PRESERVING FEATURE--ETC(U)
JUN 77   S A STARKS, R J DE FIGUEIREDO     AF-AFOSR-2777-75
EE-TR-7708     AFOSR-TR-77-1141


RICE  UNIVERSITY 


GEORGE  R.  BROWN  SCHOOL  OF  ENGINEERING 


DEPARTMENT  OF  ELECTRICAL  ENGINEERING 


HOUSTON,  TEXAS 


AN  ALGORITHM  FOR  OPTIMAL  NONLINEAR 
STRUCTURE  PRESERVING  FEATURE  EXTRACTION 


by 

Scott A. Starks* and Rui J. P. de Figueiredo*†

*Department of Electrical Engineering

†Department of Mathematical Sciences

Rice University, Houston, Texas 77001

JUNE 1977

TECHNICAL REPORT EE 7708


Note:  This  paper  is  to  be  presented  at  the  1977  IEEE  International  Symposium 
on  Information  Theory,  to  be  held  at  Cornell  University,  Ithaca,  N.Y. 
on  October  10  through  14,  1977. 


ABSTRACT

This paper presents a new approach to nonlinear
structure preserving feature extraction. The ideas behind
this method were first introduced in (1). The method is
based on certain graph theoretical considerations (such as
the minimal spanning tree, edge inconsistency, and diameter
edges) and topological considerations (such as interpoint
distance measures). After introduction of the subject
matter and appropriate background material, the algorithm
is formulated in Section 5.

Numerical results from the application of this algorithm
to various test data sets are presented. The test results
are quite encouraging.


This work was supported by the Air Force Office of Scientific
Research under Grant 75-2777.

1.  Introduction

The problem of developing an efficient method of
feature extraction is one of the most significant in
the field of pattern recognition. Due to its importance,
feature extraction has received much attention in the
literature. Good reviews of this subject can be found in (2)
and (3).

However, most of the existing methods for feature extraction
have been developed in the context of fairly
rigidly defined problems. As a result, relatively little
work has been done in developing general approaches to
feature extraction which are not problem dependent.

The general approaches for dimensionality reduction
which have been devised tend to be based upon information
theoretic and statistical foundations. In particular, many
such methods seek to minimize the probability of error or
some bound on the probability of error. Such approaches
usually assume that a normal probability density function
underlies each class. In practice, such an assumption may
be only partially true. In other cases, even if the
underlying probability density functions were normal, it is
virtually impossible to obtain reliable estimates of the
class conditional statistics given a limited number of
training samples. Consequently, application of general
feature extraction techniques based solely on information
theoretic or statistical grounds is often inappropriate.

Other general methods of feature extraction based on
Karhunen-Loeve theory have been developed (4). However, these
methods require an estimate of the lumped covariance matrix.
If one is given only a small number of training samples from
a measurement space of high dimensionality, a reliable
estimate of this covariance matrix cannot be made. Therefore,
in this case, feature extraction techniques based upon the
Karhunen-Loeve expansion are inapplicable.

What is needed is a feature extraction technique which
meets the following requirements. First, it should be
general in nature; that is, it should not be limited to
application on only a certain type of data. Second, it
should be applicable to situations where
the ratio of the number of training samples to the dimension
of the measurement space is small. As a result, the method
should not require complete knowledge of
the underlying class conditional probability density
functions. Finally, the feature extraction method should be
computationally efficient.

In the present report, we formulate a new approach
to nonlinear feature extraction which meets the above
requirements. The proposed method is based on optimally
preserving certain graph theoretical and topological
attributes present in a data set. In the following sections,
the mathematical groundwork for this feature extraction
method is laid and the attributes which constitute structure
in a data set are studied. The feature extraction algorithm
is then formulated and some numerical results obtained from
the application of this technique to various data sets are
given.

2.  Mathematical Groundwork

Feature  extraction  essentially  amounts  to  finding  a 
transformation  from  one  space  called  the  measurement  space 
to  one  called  the  feature  space.  Observations  in  the 
measurement  space  can  be  thought  of  as  resulting  from  the 
conversion  of  some  physical  excitation  into  raw  data  by  some 
sensing  device  or  system.  The  feature  extractor  then 
transforms  this  raw  data  into  a set  of  variables  called 
features  which  hopefully  provide  enough  information  to  allow 
for  correct  classification.  The  block  diagram  of  a typical 
pattern  recognition  system  is  given  in  Fig.  1. 

At this point, we state the general problem
which this paper addresses. Let there be given a set of $N$
data vectors $X = (X_1, X_2, \ldots, X_N)$ where each $X_i$ is a vector
which belongs to a real measurement space of dimensionality
$n$. We wish to find a corresponding set of vectors
$Y = (Y_1, Y_2, \ldots, Y_N)$ where each $Y_i$ is a vector belonging to a
real feature space of dimension $m$. We require that $m$ be an
integer satisfying $1 \le m < n$. We wish that $Y$ be constructed in
such a manner that the structure present in $X$ is optimally
preserved under the mapping from $X$ to $Y$. That is, we wish
that the mapping take place with the least degradation of
structure. Obviously, the question arises as to what
constitutes structure in a data set. This paper
addresses this question later. For the time being, we
summarize the way in which other authors have approached the
general nonlinear feature extraction problem posed above.


3.  Background

In (5), Sammon developed a nonlinear mapping technique
for data structure analysis. In his algorithm, Sammon
optimized a criterion functional based on interpoint
distance preservation using a steepest descent method. If
$d_{ij}$ represents the Euclidean distance between measurement
vectors $X_i$ and $X_j$ and $d^{*}_{ij}$ represents the Euclidean distance
between corresponding feature vectors $Y_i$ and $Y_j$, then the
mapping criterion is defined as:

$$Q(Y) = \frac{1}{c} \sum_{i<j} \left(d_{ij} - d^{*}_{ij}\right)^{2} \tag{3.1}$$

where $c$ is given by

(3.2)  [equation illegible in the source]

and $X_i \in R^{n}$ and $Y_i \in R^{m}$.

In general, structure preservation in this approach
is maintained by fitting the $N$ points in the feature space
such that their interpoint distances best approximate the
corresponding interpoint distances in the measurement space.
If $Y_i = (y_{i1}, \ldots, y_{im})^{T}$, examination of (3.1) shows that the
criterion functional is based upon the $(N \times m)$ variables $y_{ij}$,
$i = 1, \ldots, N$; $j = 1, \ldots, m$.

In certain cases, the product $(N \times m)$ may be so large
that optimizing (3.1) would be prohibitive in terms of
computational complexity. As a result, Chang and Lee
presented a heuristic relaxation method for the solution of
the nonlinear feature extraction problem in (6). Other
researchers in the area include Shepherd (7), Calvert (8),
and White (9).
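To make the interpoint-distance criterion concrete, the following is a minimal Python sketch of a criterion of the form (3.1): a sum over pairs i < j of the squared difference between the measurement-space and feature-space distances, divided by a normalizer. The function names and the default `c=1.0` are assumptions made for illustration; Sammon's published criterion additionally weights each term by the reciprocal of the measurement-space distance, which is omitted here.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def stress(X, Y, c=1.0):
    """Interpoint-distance criterion over all pairs i < j.

    d is measured between rows of X (measurement space) and d_star
    between rows of Y (feature space). The normalizer c is an assumed
    parameter, not a value taken from the report.
    """
    n_points = len(X)
    total = 0.0
    for i in range(n_points):
        for j in range(i + 1, n_points):
            d = euclidean(X[i], X[j])
            d_star = euclidean(Y[i], Y[j])
            total += (d - d_star) ** 2
    return total / c

# Three collinear points in 3-space and two candidate 1-D images.
X = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
Y_good = [(0.0,), (1.0,), (3.0,)]   # preserves every interpoint distance
Y_bad = [(0.0,), (1.0,), (2.0,)]    # shortens two of the three distances
```

Here `stress(X, Y_good)` is zero because every distance is preserved, while `stress(X, Y_bad)` is positive; an optimizer would adjust the N x m coordinates of Y to drive this value down.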

In all the above cases, the same criterion functional
(3.1) is used. In other words, interpoint distance is
treated as the only depository of information about a data
set. This description of structure in a data set seems
somewhat incomplete, and further examination of
what constitutes structure in a data set is warranted. In
the following section we do just this.

4.  Structure in a Data Set

When one is confronted with
analyzing data from a high-dimensional space under the
handicap of having only a small number of observations, one
often resorts to cluster analysis to learn something about
the structure of the data. Cluster analysis has been studied
by a number of authors. For good reviews on this subject,
one may refer to (10), (11), and (12).

Let us consider the problem of clustering a set of $M$
objects $Z = (Z_1, Z_2, \ldots, Z_M)$ into $C$ clusters $W_1, W_2, \ldots, W_C$.
We may think of clustering as no more than the process of
partitioning $M$ objects into $C$ mutually exclusive groups.
Equivalently, clustering can be thought of as merely a process
of assigning one of $C$ membership labels to each of the $M$
objects.

One large category of clustering algorithms is that of
hierarchical clustering. Hierarchical clustering consists of
a number of different methods which are sequential in
nature. Such methods are fully described in (13).

The two major types of hierarchical clustering are
agglomerative and divisive hierarchical clustering. In
agglomerative hierarchical clustering schemes, one begins by
placing each of the $M$ objects into $M$ singleton clusters. At
the next stage of the procedure, a new partition is obtained
by joining the two "closest" clusters into one cluster, thus
diminishing the total number of clusters by one. Here,
"closest" depends on how one defines the distance between
two clusters. This process is repeated until the desired
number of clusters is obtained. Divisive hierarchical
clustering is similar except that one begins by placing all
$M$ objects into one cluster. Then at each stage of the
procedure, one cluster is split into two clusters.

Regardless of the method employed, the way one defines
the distance between two clusters is critical. Two of the
most commonly used measures of distance between clusters are

$$d_{\min}(W, W') = \min_{\alpha \in W,\ \beta \in W'} d(\alpha, \beta) \tag{4.1}$$

$$d_{\max}(W, W') = \max_{\alpha \in W,\ \beta \in W'} d(\alpha, \beta) \tag{4.2}$$

where $W$ and $W'$ are clusters and $d(\cdot,\cdot)$ represents some
distance measure such as Euclidean distance.
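The two cluster-to-cluster measures (4.1) and (4.2) can be sketched in a few lines of Python. This is only an illustration: the function names `d_min` and `d_max`, the default Euclidean metric, and the sample clusters are assumptions introduced here.

```python
import math
from itertools import product

def euclidean(u, v):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def d_min(W, W_prime, d=euclidean):
    """Smallest distance between a point of W and a point of W_prime."""
    return min(d(a, b) for a, b in product(W, W_prime))

def d_max(W, W_prime, d=euclidean):
    """Largest distance between a point of W and a point of W_prime."""
    return max(d(a, b) for a, b in product(W, W_prime))

# Two small clusters on the horizontal axis.
W = [(0.0, 0.0), (1.0, 0.0)]
W_prime = [(3.0, 0.0), (5.0, 0.0)]
nearest = d_min(W, W_prime)    # distance from (1, 0) to (3, 0)
furthest = d_max(W, W_prime)   # distance from (0, 0) to (5, 0)
```

Passing a different callable as `d` swaps in another distance measure without changing either function.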

Consider the use of $d_{\min}$ as a measure of distance
between clusters in an agglomerative hierarchical scheme. We
may think of the data points as nodes of some graph. A
review of graph theory can be found in (14). As a result of
using $d_{\min}$, nearest neighbors determine the nearest or
"closest" subsets or clusters. Suppose that we represent the
merger of two clusters $W$ and $W'$ by adding an edge between the
nearest pair of nodes, one in $W$ and the other in $W'$. Since
we are always joining distinct clusters, the graph resulting
from these edge additions will contain no loops or circuits.
Such a graph is termed a tree. If the process of adding
edges continues until all nodes are linked into a single
connected graph, the resulting graph is termed a spanning
tree. It can be proven that if at each stage of the
algorithm the nearest pair of clusters is merged, then the
sum of the edge lengths for this spanning tree will never be
greater than the sum of the edge lengths of any other
spanning tree. Thus this graph is said to be the minimal
spanning tree (MST) for the data set and the clustering
procedure is called the nearest neighbor or single-linkage
algorithm.
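The merging process just described can be sketched as follows. This is a minimal illustration, not the authors' FORTRAN implementation: it scans point pairs in order of increasing distance and adds an edge whenever the pair lies in two distinct clusters, which is equivalent to always merging the two clusters with the smallest nearest-neighbor distance. The function name and the sample points are assumptions.

```python
import math
from itertools import combinations

def euclidean(u, v):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def single_linkage_mst(points):
    """Merge nearest clusters until one remains; return the N-1 edges added."""
    n = len(points)
    label = list(range(n))          # every point starts as its own cluster
    pairs = sorted(combinations(range(n), 2),
                   key=lambda p: euclidean(points[p[0]], points[p[1]]))
    edges = []
    for i, j in pairs:
        if label[i] != label[j]:    # nearest pair lying in distinct clusters
            edges.append((i, j))
            old, new = label[j], label[i]
            label = [new if l == old else l for l in label]
            if len(edges) == n - 1:  # all points are now in one cluster
                break
    return edges

points = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (5.0, 0.0)]
mst = single_linkage_mst(points)
```

Because an edge is added only between distinct clusters, the result can never contain a circuit, and stopping the loop early (before N-1 edges) yields the single-linkage partition at that level of the hierarchy.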

When the measure of distance $d_{\max}$ is incorporated in an
agglomerative scheme, we obtain the so-called furthest
neighbor clustering algorithm. This algorithm also has a
graph theoretical analogy. At each stage of the hierarchy,
we produce a graph consisting of a number of complete
subgraphs representing the clusters. A complete subgraph is
one in which each pair of nodes is connected by an edge.
Referring to the definition of $d_{\max}$, we deduce that the
distance between two distinct clusters is determined by the
length of the edge connecting the most distant pair of
nodes. This quantity is referred to as the diameter of the
union of the two clusters.

As was seen previously, the minimal spanning tree is
intimately connected with the single-linkage clustering
algorithm. The MST is a deceptively simple structure which
was first introduced by Prim in (15). Zahn in (18) was the
first to demonstrate its amazing powers in handling
clustering problems which had previously defied solution. He
showed that clusters in a two-dimensional space which the
eye identified immediately as separate entities could be
separated trivially by an algorithm based on the MST.

An important attribute associated with the MST is that
of edge inconsistency. The inconsistency of an edge of the
MST is defined in the following manner. Suppose that an edge
of the MST connects points $X'$ and $X''$. The inconsistency
measure of the edge is defined as the ratio of the length of
the edge connecting $X'$ and $X''$ to the average length of
all edges connecting either $X'$ or $X''$, but not both. Zahn
states that when the value of edge inconsistency exceeds a
value of 2, the human eye tends to perceive the
components connected to the endpoints of the edge as
separate entities. In fact, Zahn states that the MST is the
fundamental mechanism explaining proximity and Gestalt
effects in psychology.
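As an illustration of the inconsistency measure, the sketch below computes, for each tree edge, its length divided by the mean length of the other tree edges that touch exactly one of its endpoints. The function name, the dictionary-based data layout, and the convention of returning 0.0 when an edge has no neighboring edges are assumptions made for this example.

```python
def edge_inconsistency(edges, length):
    """Map each tree edge to length / mean length of its neighboring edges.

    edges  : list of (i, j) tuples forming a tree
    length : dict mapping each edge tuple to its length
    """
    result = {}
    for e in edges:
        i, j = e
        # Other edges incident to either endpoint of e (e itself excluded).
        neighbors = [f for f in edges if f != e and (i in f or j in f)]
        if not neighbors:
            result[e] = 0.0   # assumed convention: nothing to compare against
        else:
            mean = sum(length[f] for f in neighbors) / len(neighbors)
            result[e] = length[e] / mean
    return result

# A chain 0-1-2-3-4-5: two tight groups joined by one long edge (2, 3).
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
length = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 5.0, (3, 4): 1.0, (4, 5): 1.0}
E = edge_inconsistency(edges, length)
```

In this example only the long bridging edge has an inconsistency above the threshold of 2 quoted from Zahn, so it is the one edge that separates visually distinct groups.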

The main properties of the MST can be summarized as
follows:

1. Any node is connected to at least one of its nearest
neighbors.

2. Any subtree is connected to at least one of its
nearest neighbors by the shortest available path.

3. The MST minimizes all increasing symmetric functions
of interpoint distance.

4. The MST connectivity is invariant under any mapping
which preserves the rank order of interpoint distances.

5. The MST is easy to compute and it resembles a
loopless skeleton of the configuration.

In Fig. 2, the minimal spanning tree for a set of
2-dimensional data points is given.
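The invariance of MST connectivity under rank-preserving mappings of the distances can be checked numerically. The sketch below builds the tree with Prim's method (chosen here only for brevity) using Euclidean distance and again using squared Euclidean distance, which has the same rank order. The function name `prim_mst` and the sample points are assumptions for illustration.

```python
import math

def prim_mst(n, dist):
    """Grow a spanning tree from node 0, always adding the cheapest edge
    that joins a tree node to a non-tree node. Returns a set of edges."""
    in_tree = {0}
    edges = set()
    while len(in_tree) < n:
        i, j = min(((a, b) for a in in_tree
                    for b in range(n) if b not in in_tree),
                   key=lambda e: dist(*e))
        edges.add(frozenset((i, j)))
        in_tree.add(j)
    return edges

points = [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0), (4.0, 2.0), (4.0, 6.0)]

def d(i, j):
    """Euclidean distance between points i and j."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(points[i], points[j])))

def d_squared(i, j):
    """A monotone transformation of d: same rank order, different values."""
    return d(i, j) ** 2

tree = prim_mst(len(points), d)
tree_squared = prim_mst(len(points), d_squared)
same_connectivity = tree == tree_squared
```

The edge lengths differ between the two runs, but the set of connected pairs is identical, which is the content of property 4.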

In the following section, we incorporate the concepts
of the MST, cluster diameter, and edge inconsistency into a
method for nonlinear feature extraction.

5.  Feature Extraction Procedure

For convenience, we shall now restate the feature
extraction problem:

Let there be given a set of points $X = (X_1, X_2, \ldots, X_N)$
where each $X_i \in R^{n}$. We wish to find a corresponding set of
points $Y = (Y_1, Y_2, \ldots, Y_N)$ where each $Y_i \in R^{m}$ $(1 \le m < n)$ in such a
manner that the structure contained in the data set $X$ is
optimally preserved under the transformation $g: R^{n} \to R^{m}$ which
maps $X$ into $Y$. Due to the one-to-one correspondence between
sets $X$ and $Y$, we take the liberty of expressing $g(X_i)$ as $Y_i$.

We begin the feature extraction procedure by computing
the $N(N-1)/2$ independent interpoint distances for the set $X$.
Once this is done, the minimal spanning tree for the data
set can be constructed. We then compute the
edge inconsistency measure for each of the $(N-1)$ edges of
the tree. These values for edge inconsistency are stored in
an array $E$ of dimension $(N-1)$.

In addition to array $E$, we introduce the array $C$ which
shall be used in the feature extraction process. Since we
may think of clustering as merely a method of labelling
points according to cluster membership, we can store such
membership labels in the array $C$. The $i$th component of $C$,
denoted $C_i$, will contain the cluster membership of point $X_i$.
Initially, we place all points of $X$ into cluster $W_1$, so
$C_j = 1$ for all values of $j$.

Another array $B$ of dimension $N$ is utilized in the
procedure. Since the feature extraction process proposed is
a sequential one, it is necessary to keep track of which
vectors have been transformed. We use the array $B$ to do
this. We initially set all the components of $B$ equal to
0, to reflect the fact that none of the vectors have been
transformed. As we transform a vector, say $X_i$, we set $B_i$ to 1
and it remains at that value for the remainder of the
procedure.
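The bookkeeping described so far can be sketched as a small initialization routine. This is an assumed layout, not the report's FORTRAN data structures: the interpoint distances are held in a dictionary keyed by index pairs, indices are 0-based, and the name `init_state` is hypothetical. The array E is omitted because it depends on the tree construction.

```python
import math
from itertools import combinations

def init_state(X):
    """Return (D, C, B) for a list of N measurement vectors X.

    D : the N(N-1)/2 interpoint distances, keyed by (i, j) with i < j
    C : cluster labels, all 1 because every point starts in cluster W_1
    B : transformed flags, all 0 because no image has been located yet
    """
    n = len(X)
    D = {(i, j): math.sqrt(sum((a - b) ** 2 for a, b in zip(X[i], X[j])))
         for i, j in combinations(range(n), 2)}
    C = [1] * n
    B = [0] * n
    return D, C, B

X = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 1.0), (1.0, 1.0, 1.0)]
D, C, B = init_state(X)
```

As each vector is transformed, the corresponding entry of `B` would be set to 1 and left there.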

To begin the feature extraction process, we search the
array $E$ to find the MST edge with the largest value of edge
inconsistency. The endpoints of this edge are determined and
are denoted by $X_a^{1}$ and $X_b^{1}$. At this time, the diameter edge is
found and the endpoints of this edge are denoted by $X_c^{1}$ and
$X_d^{1}$. Once the identity of these four points is known, we
group them into what is called the active set, $A^{1}$. This set,
$A^{1} = \{X_a^{1}, X_b^{1}, X_c^{1}, X_d^{1}\}$, is termed the active set because it
contains the vectors whose images $Y_a^{1}$, $Y_b^{1}$, $Y_c^{1}$, and $Y_d^{1}$ are to be
located optimally in the feature space. The superscript
indicates that we are at the first stage of the feature
extraction process.

Once the active set membership is determined, we update
the array $B$ by setting $B_i$ to 1 for each $X_i$ in $A^{1}$. We then proceed
to find the feature or image space configuration
$g(A^{1}) = (Y_a^{1}, Y_b^{1}, Y_c^{1}, Y_d^{1})$ which minimizes the following
criterion:

$$Q^{1}(g(A^{1})) = \sum_{i \in I} \sum_{j=1}^{N} \left(d_{ij} - d^{*}_{ij}\right)^{2} \Delta(1 - B_j) \tag{5.1}$$

where

$$I = \{\, i \mid X_i \in A^{1} \,\}, \tag{5.2}$$

$d_{ij}$ is the distance between $X_i$ and $X_j$, $d^{*}_{ij}$ is the distance
between $Y_i$ and $Y_j$, and $\Delta(\cdot)$ is the standard Kronecker delta
function.

The above (5.1) is the standard Sammon criterion except
that it is applied to only the four data points of the
active set. In addition, we do not require that the distance
metric be Euclidean. The optimization of (5.1) may be
carried out by an appropriate multivariable optimization
technique.
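A minimal sketch of evaluating the stage criterion follows, under the assumption that (5.1) is a double sum over i in the active set and j = 1, ..., N of the squared distance discrepancy, with the Kronecker delta keeping only those j whose images have already been located (B_j = 1). The function name, the nested-list distance matrices, and the toy values are all hypothetical.

```python
def stage_criterion(active, B, d, d_star):
    """Sum of (d[i][j] - d_star[i][j])**2 over i in the active set and
    over every j whose flag B[j] equals 1."""
    total = 0.0
    for i in active:
        for j in range(len(B)):
            if B[j] == 1:          # delta term: 1 only when B_j = 1
                total += (d[i][j] - d_star[i][j]) ** 2
    return total

# Toy case with N = 3: points 0 and 1 are active, point 2 is not yet placed.
active = [0, 1]
B = [1, 1, 0]
d = [[0.0, 2.0, 9.0],
     [2.0, 0.0, 9.0],
     [9.0, 9.0, 0.0]]          # measurement-space distances
d_star = [[0.0, 1.0, 0.0],
          [1.0, 0.0, 0.0],
          [0.0, 0.0, 0.0]]     # feature-space distances for a trial layout
q = stage_criterion(active, B, d, d_star)
```

Distances involving point 2 do not contribute because its flag is 0; a multivariable optimizer would vary only the coordinates of the active points to reduce `q`.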

Before beginning the next stage of the algorithm, we
must update the cluster membership vector $C$. Suppose that we
delete the most inconsistent edge from the MST. The
resulting graph would then consist of two connected
subgraphs, representing two clusters. We arbitrarily take
one of these clusters and relabel it as $W_2$. To reflect this,
we set $C_i$ to 2 for all $i$ such that $X_i$ is in the selected
cluster $W_2$. Of course, the vectors contained in the other
subgraph remain in cluster $W_1$ and there is no need to update
the components of $C$ relating to them.
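The relabelling step can be sketched as follows. The routine removes one edge from the tree, collects the nodes still reachable from one endpoint of that edge, and assigns them a new label. Names and data layout are assumptions; the caller is assumed to pass the tree with any previously deleted edges already removed.

```python
def relabel_after_cut(tree_edges, cut, C, new_label):
    """Delete `cut` from the tree and give one resulting side `new_label`.

    Returns a new label list; the input list C is left unchanged.
    """
    remaining = [e for e in tree_edges if e != cut]
    component = {cut[1]}            # arbitrarily relabel the side of cut[1]
    frontier = [cut[1]]
    while frontier:
        node = frontier.pop()
        for a, b in remaining:
            other = b if a == node else a if b == node else None
            if other is not None and other not in component:
                component.add(other)
                frontier.append(other)
    updated = list(C)
    for i in component:
        updated[i] = new_label
    return updated

# Chain 0-1-2-3-4-5 with the bridging edge (2, 3) deleted.
tree_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
C = [1] * 6
C_new = relabel_after_cut(tree_edges, (2, 3), C, new_label=2)
```

Which side receives the new label is arbitrary, as in the text; here it is the side containing the second endpoint of the deleted edge.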

In general, we repeat the above procedure until all
points in $X$ have their images mapped into $Y$ or until the
value of edge inconsistency fails to exceed 2. When the
value of edge inconsistency fails to exceed 2, we know
that the edge under consideration is not an edge separating
what the eye would perceive as two distinct entities or
clusters. As a result, we modify our approach. This
modification will be discussed later.

To formulate the general approach, suppose that we are
at the $K$th stage of the algorithm. We conduct a search of the
array $E$ to find the $K$th most inconsistent edge of the MST.
Suppose that this edge has a value of inconsistency greater
than 2. Furthermore, assume that this edge is contained in
cluster $W_P$ and that it has endpoints $X_a^{K}$ and $X_b^{K}$. We then find
the diameter edge of cluster $W_P$ and determine its endpoints $X_c^{K}$
and $X_d^{K}$. From these four points $X_a^{K}$, $X_b^{K}$, $X_c^{K}$, and $X_d^{K}$, we find
the points which have not yet had their images located in
the feature space and we place these points in the active
set $A^{K}$. We then update the $B$ vector by setting $B_i$ to 1 for
each $X_i$ contained in the active set $A^{K}$. We then find the
configuration $g(A^{K})$ which minimizes the following criterion
functional:

$$Q^{K}(g(A^{K})) = \sum_{i \in I} \sum_{j=1}^{N} \left(d_{ij} - d^{*}_{ij}\right)^{2} \Delta(1 - B_j) \tag{5.3}$$

where

$$I = \{\, i \mid X_i \in A^{K} \,\} \tag{5.4}$$

and $d_{ij}$ and $d^{*}_{ij}$ are as defined before.

Once the minimization is performed, the cluster
membership vector $C$ is updated by the method described
earlier and the $(K+1)$st stage of the algorithm is begun.

If the value of edge inconsistency fails to exceed 2
at the $K$th stage of the algorithm, we realize that such an
edge does not connect what would readily be perceived as two
distinct entities. As a result, we would like to stress
intracluster relationships at this stage. To reflect this,
we make a modification to the criterion functional expressed
in (5.3). Instead of (5.3), we minimize the following
modified criterion functional:

$$Q^{K}(g(A^{K})) = \sum_{i \in I} \sum_{j=1}^{N} \left(d_{ij} - d^{*}_{ij}\right)^{2} \Delta(1 - B_j)\, \Delta(C_j - P) \tag{5.5}$$

where all quantities are as defined before.
The modified criterion functional stresses intracluster over
intercluster relationships by virtue of the second Kronecker
delta term. Recall that the $K$th most inconsistent edge fell
in cluster $W_P$. Therefore we concern ourselves only with
interpoint distances among members of the active set and
members of $W_P$ already located in the image space. In this way
we emphasize intracluster structure and at the same time
simplify the optimization problem. After (5.5) is minimized
we proceed to the $(K+1)$st stage of the algorithm and repeat.
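The effect of the two Kronecker delta terms can be seen by listing which points j are allowed to contribute to the criterion at a given stage. The helper below is a hypothetical illustration: with `p=None` it applies only the first delta (image already located, B_j = 1), and with a cluster label `p` it also applies the second delta (C_j = P).

```python
def contributing_points(B, C, p=None):
    """Indices j for which the delta terms of the criterion equal 1.

    B : transformed flags (1 if the image of point j has been located)
    C : cluster labels
    p : cluster label to restrict to, or None for no cluster restriction
    """
    return [j for j in range(len(B))
            if B[j] == 1 and (p is None or C[j] == p)]

B = [1, 1, 0, 1]
C = [1, 2, 2, 2]
unrestricted = contributing_points(B, C)        # first delta only
intracluster = contributing_points(B, C, p=2)   # both deltas, cluster W_2
```

In the restricted case, point 0 is dropped even though its image is already located, because it belongs to a different cluster; this is how the modified criterion ignores intercluster distances and shrinks the sum.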

A flowchart for the sequential nonlinear structure
preserving feature extraction algorithm is presented in
Fig. 3. One major advantage of this method over other
existing methods is that at any stage of the algorithm we
are concerned with optimizing a criterion which is dependent
on at most $(4 \times m)$ variables. This greatly reduces the
optimizational complexity of the problem.
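As a rough worked comparison, take the sizes of the test data sets of Section 6 with a 2-dimensional feature space (m = 2). The count of free variables when all image points are optimized simultaneously, against the largest count in any single stage of the sequential procedure, is:

```latex
\begin{align*}
\text{Gaussian data } (N = 15):\quad & N \times m = 15 \times 2 = 30
  \quad\text{versus}\quad 4 \times m = 8, \\
\text{nonlinear data } (N = 30):\quad & N \times m = 30 \times 2 = 60
  \quad\text{versus}\quad 4 \times m = 8, \\
\text{oil spectra } (N = 16):\quad & N \times m = 16 \times 2 = 32
  \quad\text{versus}\quad 4 \times m = 8.
\end{align*}
```

The per-stage count does not grow with N, although the number of stages does.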

The algorithm has been programmed in FORTRAN at the
Institute for Computer Services and Applications at Rice
University. Some numerical results obtained by applying this
method to several different data sets are presented in the
next section.

6.  Numerical Results

The feature extraction algorithm was applied to three
different data sets to test its utility. In each case, the
dimension of the feature space was chosen to be 2 so that
graphical displays of the transformed data could be obtained.
The test data sets were as follows:

1. Artificially generated data: Fifteen points in a
4-dimensional space were generated from three Gaussian
pattern classes. Each pattern class shared the same
covariance matrix (the identity matrix). The means of the
pattern classes were located at the vertices of an isosceles
triangle. The resulting transformed configuration is shown
in Fig. 4.

2. Nonlinear data: This data set consisted of 30 points
distributed along a nonlinear curve in a 5-dimensional space.
The parametric equations governing this curve were

$$U_1 = \cos U_5 \tag{6.1}$$

$$U_2 = \sin U_5 \tag{6.2}$$

$$U_3 = 0.5 \cos 2U_5 \tag{6.3}$$

$$U_4 = 0.5 \sin 2U_5 \tag{6.4}$$

$$U_5 = 0.707\, t \tag{6.5}$$

where $t = 0, 1, \ldots, 29$. This data set was taken from (5). The
resulting 2-space configuration is shown in Fig. 5.

3. Ultraviolet fluorescence spectrographic data: Sixteen
ultraviolet fluorescence spectra representing 3 classes of
oils (crudes, diesels, and No. 6 fuel oils) were sampled at
15 wavelengths. The resulting vectors were mapped into a
2-space as indicated in Fig. 6.
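The nonlinear data set (item 2) is easy to regenerate. The sketch below assumes the parametric equations read U1 = cos U5, U2 = sin U5, U3 = 0.5 cos 2U5, U4 = 0.5 sin 2U5, and U5 = 0.707 t for t = 0, 1, ..., 29; the function name is hypothetical.

```python
import math

def nonlinear_curve(num_points=30):
    """Generate points along the 5-dimensional curve of the second test set."""
    data = []
    for t in range(num_points):
        u5 = 0.707 * t
        data.append((math.cos(u5),
                     math.sin(u5),
                     0.5 * math.cos(2 * u5),
                     0.5 * math.sin(2 * u5),
                     u5))
    return data

U = nonlinear_curve()
```

The first two coordinates trace a unit circle and the next two a circle of radius 0.5 at twice the angular rate, while the fifth coordinate grows linearly, which is what gives the set its string-like structure.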

All computer experiments were conducted on an IBM
370/155 general purpose computer. The feature extraction
algorithm requires 60 Kbytes of memory. In each case,
Euclidean distance was used as the distance metric.

Examination of Fig. 4 reveals that the three Gaussian
classes are fairly well separated. Similarly, Fig. 6
indicates a good separation of the three classes of oils. In
Fig. 5, we clearly see the string-like structure present in
the nonlinear data set. Although in all examples the feature
space was 2-dimensional, there is no restriction that this
need always be the case.

7.  Conclusions

An algorithm for nonlinear feature extraction has been
presented. This algorithm emphasizes structures from graph
theory such as the minimal spanning tree, inconsistent
edges, and diameter edges as important attributes to be
preserved under transformation. The process is sequential
and hierarchical in nature, thus easing the computational
complexity encountered with other nonlinear feature
extraction algorithms. Since the algorithm was formulated in
a manner that stresses structural properties present in
data sets, the incorporation of this method in an
interactive pattern recognition system would be interesting.

Encouraging results from the application of this
algorithm to various data sets were presented. Currently,
application of this method to other data bases is being
performed. The results from this effort should appear
shortly.